1. MULTIPLE DECISION-MAKER PROBLEMS WITH UNKNOWN PARAMETERS


The problem of strategic decision making in complex systems which involve
multiple decision makers (DM's), multiple objectives, and incomplete information
arises frequently in the military context, and in particular in the Command,
Control, and Communications (C³) systems field. As compared with single DM
problems, the analysis of multiple DM problems requires different approaches and
techniques; furthermore, certain standard features and properties we usually
ascribe to single DM problems do not generally extend naturally to multiple
decision making. For example, while in single DM problems optimization
(minimization or maximization) of a single objective functional would, in
general, lead to a satisfactory decision policy (the so-called optimal policy),
when the decision problem involves multiple DM's and multiple objectives a
plethora of possibilities emerge as to the criterion which leads to a
"satisfying" set of policies. Depending on the number of DM's, their underlying
goals, and the presence or absence of dominance in the decision making process,
we may have team-optimal, person-by-person optimal, Pareto-optimal, Nash
equilibrium, and Stackelberg (leader-follower) equilibrium concepts, and several
variants of combinations of these in the case of more than two DM's. Each of
these in general leads to a different outcome, which also depends on the
information structure of the problem (i.e., what each DM knows a priori, what
information he acquires during the evolution of the decision process, what
information exchange links are allowable, and what information transmission
capability each DM is vested with). The significance of information structure in
multiple DM problems also manifests itself in the derivation of multimodel
strategies: model simplification through singular perturbations or aggregation
is not a well-posed procedure unless there is some kind of matching between the
information structures of the original problem and the simplified version. No
such inconsistencies arise, however, in single decision-maker problems.
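To make concrete the remark that different solution concepts generally lead to different outcomes, the following Python sketch computes the Nash and the Stackelberg (DM1 as leader) solutions of a small two-DM game. The quadratic cost functions and the numbers `a, b, c, d` are illustrative assumptions, not a model taken from this report.

```python
import numpy as np

# Hypothetical scalar two-DM quadratic game (illustrative numbers only):
#   J1(u1, u2) = (u1 - a)**2 + b*u1*u2   (DM1)
#   J2(u1, u2) = (u2 - c)**2 + d*u1*u2   (DM2)
a, b, c, d = 1.0, 1.0, 1.0, 1.0


def costs(u1, u2):
    return (u1 - a) ** 2 + b * u1 * u2, (u2 - c) ** 2 + d * u1 * u2


def nash():
    # Both first-order conditions hold simultaneously:
    #   2(u1 - a) + b*u2 = 0  and  2(u2 - c) + d*u1 = 0
    lhs = np.array([[2.0, b], [d, 2.0]])
    rhs = np.array([2.0 * a, 2.0 * c])
    return np.linalg.solve(lhs, rhs)


def stackelberg_dm1_leads():
    # DM2 (follower) reacts with u2 = c - d*u1/2. DM1 (leader) substitutes this
    # reaction into J1; setting dJ1/du1 = 0 gives (2 - b*d)*u1 = 2a - b*c.
    if 2.0 - b * d <= 0.0:
        raise ValueError("leader's reduced problem is not strictly convex")
    u1 = (2.0 * a - b * c) / (2.0 - b * d)
    return np.array([u1, c - d * u1 / 2.0])


for name, u in [("Nash", nash()), ("Stackelberg", stackelberg_dm1_leads())]:
    print(name, u.round(4), np.round(costs(*u), 4))
```

With these numbers the Nash solution is u1 = u2 = 2/3 with J1 = J2 = 5/9, whereas the Stackelberg solution is (1, 0.5) with J1 = 0.5 and J2 = 0.75. The leader cannot do worse than at Nash in this deterministic setting, since the Nash point lies on the follower's reaction curve and is therefore available to the leader.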

Recent years have witnessed considerable advances in our understanding of
equilibrium solutions of deterministic and stochastic multi-person decision
problems, in particular as regards the Stackelberg equilibrium solution.

A class of such Stackelberg problems, long thought to be extremely challenging,
has recently been solved using indirect methods, for both deterministic and
stochastic systems. In some cases it has been shown that the Stackelberg
equilibrium strategy for the leader forces the DM's at lower levels of the
hierarchy into team behavior, jointly optimizing the leader's performance index,
even though they may each have different goals and performance indices. In other
cases, tight performance bounds have been obtained on the leader's cost
function, which are achievable by implementable policies.


A large majority of this work on multiple DM problems pertains either to
deterministic systems or to systems with uncertain elements which have a
complete probabilistic description, this a priori information being known by
all the DM's (the latter class of problems is also known as stochastic dynamic
games). Hence, even though some decentralization of dynamically acquired
information has been allowed for in the general formulation of dynamic games,
it has been a common assumption to endow every DM with the common (centralized)
a priori information regarding the complete statistical description of the
"primitive" random variables. This, however, is not always a realistic
assumption, in particular when the decision problem involves multiple DM's and
multiple objectives. A more realistic formulation, in most cases, would involve
a number of uncertain parameters which either are not stochastic or are
stochastic but whose complete statistical description is not known by all the
DM's.

The presence of unknown (or uncertain) parameters could affect the general
problem formulation in basically three different ways:

i) Through the objective functions. Here, the objective function of the i'th
    DM may not be known completely by the j'th DM (j ≠ i), with the uncertainty
    characterized by a number of parameters whose values are unknown to the
    j'th DM.
ii) Through the system response. The evolution of the decision process may
    depend on a number of parameters whose values are unknown to some or all
    DM's. [This type of uncertainty is also applicable to stochastic team
    problems.]
iii) Through the measurements made by the DM's. Here either the observation
    scheme or the statistics of some of the variables in the measurement process
    of a DM (or both) may not be known to some other DM, with the uncertainty
    again being parameterized. [As in ii), this type of uncertainty is also
    applicable to stochastic team problems.]

Multiple DM problems with the types of uncertainties described above can be
treated by adopting essentially one of the following three approaches:


Robustness or Minimum Sensitivity Approach. Here one assumes some nominal values
for the unknown parameters, determines a corresponding nominal performance for
the system, and designs decision policies which lead to minimum performance
degradation should the parameters vary around their nominal values. The
resulting decision policies are called minimum sensitivity strategies, and they
are robust in a certain neighborhood of the nominal values.
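The minimum sensitivity idea can be sketched on a hypothetical one-step scalar problem with an unknown parameter `theta` and nominal value `theta0`. Penalizing the squared sensitivity dJ/dθ at the nominal value, with an assumed weight `w`, is one common way of trading nominal performance for robustness; the problem data, the weight, and the function names are all illustrative assumptions.

```python
import numpy as np

# Hypothetical one-step scalar problem (an illustrative assumption, x0 = 1):
#   x1 = theta + u, u = -k, so J(k, theta) = (theta - k)**2 + r*k**2
theta0, r = 2.0, 0.5      # nominal parameter value and control weight


def J(k, theta):
    return (theta - k) ** 2 + r * k ** 2


def dJ_dtheta(k, theta):
    # Sensitivity of the cost to the unknown parameter theta.
    return 2.0 * (theta - k)


def min_sensitivity_gain(w):
    # Minimise J(k, theta0) + w * dJ_dtheta(k, theta0)**2 over k. The objective
    # equals (1 + 4w)(theta0 - k)**2 + r*k**2, whose minimiser is closed form.
    return (1.0 + 4.0 * w) * theta0 / (1.0 + 4.0 * w + r)


for w in (0.0, 1.0, 10.0):
    k = min_sensitivity_gain(w)
    print(f"w={w:5.1f}  k={k:.4f}  nominal J={J(k, theta0):.4f}  "
          f"|dJ/dtheta|={abs(dJ_dtheta(k, theta0)):.4f}  "
          f"J at theta0+0.5={J(k, theta0 + 0.5):.4f}")

# Numerical check of the closed form on a fine grid of gains (w = 10).
ks = np.linspace(0.0, 3.0, 300001)
k_grid = ks[np.argmin(J(ks, theta0) + 10.0 * dJ_dtheta(ks, theta0) ** 2)]
```

Setting w = 0 recovers the nominally optimal gain θ0/(1 + r) = 4/3. Raising w to 10 moves the gain toward θ0, which costs some nominal performance but reduces the degradation when θ moves to θ0 + 0.5 from roughly 0.92 to roughly 0.27.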

Learning Schemes. In this approach no nominal values for the unknown parameters
are available, but the DM's may attach some a priori statistics to these
parameters, which are then updated in a decentralized manner as new dynamic
information is acquired. This is akin to some of the methodologies developed
earlier for control problems with unknown parameters (such as identification,
parameter estimation, and adaptive control, which are still active research
areas). Those methodologies, however, are not applicable to multiple DM
problems, because the rather intricate interactions of multiple DM's render any
central learning scheme infeasible.
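The decentralized updating of a priori statistics can be illustrated with a minimal sketch, assuming a scalar unknown parameter observed in additive Gaussian noise of known variance, and assuming each DM keeps its own conjugate normal prior and uses only its own measurements. The class name `GaussianBelief` and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 1.5   # unknown parameter (hypothetical value, used only to simulate data)
noise_var = 1.0    # measurement-noise variance, assumed known to both DMs


class GaussianBelief:
    """Normal belief N(mean, var) about theta, held privately by a single DM."""

    def __init__(self, mean, var):
        self.mean, self.var = mean, var

    def update(self, y):
        # Conjugate normal-normal update for a measurement y = theta + v,
        # v ~ N(0, noise_var); only this DM's own measurement is used.
        gain = self.var / (self.var + noise_var)
        self.mean += gain * (y - self.mean)
        self.var *= 1.0 - gain


# Two DMs start from different subjective priors and observe separate data streams.
beliefs = [GaussianBelief(0.0, 4.0), GaussianBelief(3.0, 0.5)]
for _ in range(200):
    for belief in beliefs:
        belief.update(theta_true + rng.normal(0.0, np.sqrt(noise_var)))

for i, bel in enumerate(beliefs, start=1):
    print(f"DM{i}: posterior mean {bel.mean:.3f}, posterior variance {bel.var:.5f}")
```

The sketch covers only the belief-updating ingredient: each posterior variance follows 1/var = 1/var0 + n/noise_var, and the two DMs' beliefs remain different for finite n. It does not model how each DM's policy, and hence the other DM's information, depends on those beliefs, which is where the multiple-DM difficulty described above arises.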

Minimax Approach. Here no nominal values are available for the unknown
parameters, but they are known to belong to some prespecified sets. The
objective is then to design strategies which retain the optimality or
equilibrium property under the worst possible values of the parameters on these
sets. Such an approach entails a pessimistic design philosophy, and is
applicable mostly to decision problems with a common objective functional
(i.e., team problems). In multi-objective problems, the minimax philosophy is
somewhat ambiguous at the current stage of development, since what may seem to
be a worst-case design for one objective functional may lose this property when
tested against a different objective functional. However, if different
objective functionals are affected by different sets of unknown parameters,
this approach would still be applicable, and further research would be needed
to study the ramifications of such an approach in these problems.
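For a single (team) objective, the worst-case design can be sketched by brute force. The sketch assumes a hypothetical one-step cost J(k, θ) = (θ - k)² + r·k² for the decision rule u = -k, with θ known only to lie in the prespecified set [1, 3]. The parameter set and the candidate gains are discretized on grids, which is an approximation chosen for simplicity.

```python
import numpy as np

# Hypothetical one-step team problem (illustrative assumption, x0 = 1):
#   J(k, theta) = (theta - k)**2 + r*k**2 for the decision rule u = -k,
# with theta only known to lie in the prespecified set [1, 3].
r = 0.5
theta_set = np.linspace(1.0, 3.0, 201)      # discretised parameter set
gains = np.linspace(0.0, 3.0, 3001)         # candidate decision rules


def J(k, theta):
    return (theta - k) ** 2 + r * k ** 2


# Worst-case cost of every candidate gain over the parameter set
# (rows: gains, columns: parameter values), then the best worst case.
worst = J(gains[:, None], theta_set[None, :]).max(axis=1)
k_minimax = gains[np.argmin(worst)]

# For comparison: the gain that is optimal at the midpoint value theta = 2.
k_nominal = 2.0 / (1.0 + r)
print(f"minimax gain {k_minimax:.3f}, guaranteed cost {worst.min():.3f}")
print(f"nominal gain {k_nominal:.3f}, worst-case cost {J(k_nominal, theta_set).max():.3f}")
```

The result can be checked by hand: for k ≤ 2 the worst case is θ = 3 and (3 - k)² + 0.5k² decreases up to k = 2, while for k > 2 the worst case is θ = 1 and the cost increases. The minimax gain is therefore 2 with guaranteed cost 3, where the two endpoint costs tie; the gain designed for the midpoint value guarantees only about 3.67.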

We should point out that a combination of any two, or all three, of the above
approaches would also constitute a viable approach to multi-person decision
problems with unknown parameters, which should be studied in proper contexts
once the rudiments of a theory for each one separately are laid down.




2. RESEARCH ACCOMPLISHMENTS


In our proposal, we recognized the fact that the class of multiple DM problems
with uncertain parameters, as described above, is still in its infancy, in
particular under the "Learning Scheme" approach. In view of this, we proposed to
initiate original fundamental research to make theoretical advances in this
field and to design implementable decision policies which carry both learning
and command capabilities. In doing this, we proposed to adopt the general
framework of deterministic and stochastic dynamic games, and to study these
problems under the three types of uncertainty discussed in Section 1, and under
different solution concepts such as team-optimal, Nash equilibrium, and
Leader-Follower (hierarchical).


During the first year of this project, we have addressed several challenging
issues in this context, and have made important advances. We briefly outline
some of these new results in the sequel; full details can be found in the
references listed in Section 3.


In the first group of papers, listed in Section 3 as [P1]-[P3], we have adopted
the first (i.e., minimum sensitivity) approach for a class of decision problems
which display the first type of uncertainty, viz. the case where the cost
function of one of the DM's depends on a number of parameters whose precise
values are known by him but not by the other(s). In [P1], we have presented a
general mathematical formulation and a method of solution for stochastic
incentive decision problems, using concepts and tools of dynamic game theory. As
special cases of the general formulation we have considered four different
classes of problems which differ in the information available to the DM's,
their objectives, and the numbers of DM's at different levels of the hierarchy.
The fourth class we considered can be viewed as an "exact model matching"
problem akin to the one arising in nonlinear control. In the paper, an explicit
incentive policy has been obtained for the DM occupying the higher level in the
hierarchy which, besides solving the exact matching problem, carries very
appealing minimum sensitivity properties. These features have also been
demonstrated in [P1] in the context of a numerical example. The other two
papers, [P2] and [P3], extend these results to more general models, with the
former devoted to decision problems defined on finite dimensional spaces, and
the latter dealing with nominally team problems (i.e., decision problems with a
common objective functional) defined on infinite dimensional spaces. Thus, the
formulation of [P3] also covers continuous-time stochastic control problems
with multiple decentralized controllers, allowing for parametric uncertainty in
the overall modeling from the viewpoint of some of the stations.
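The basic mechanism behind an incentive policy can be illustrated in a deterministic, two-DM setting. The leader announces an affine rule u1 = γ(u2) = u1ᵗ + q(u2 - u2ᵗ), choosing q so that the follower's best response is the leader's team-optimal point (u1ᵗ, u2ᵗ). This is a minimal sketch of a standard affine incentive construction under hypothetical quadratic costs, not the stochastic formulation or the policy derived in [P1].

```python
import numpy as np


# Hypothetical deterministic costs (illustrative assumptions, not the models of [P1]).
def J1(u1, u2):   # leader
    return (u1 - 1.0) ** 2 + (u2 - 1.0) ** 2


def J2(u1, u2):   # follower
    return (u2 - 2.0) ** 2 + 0.5 * u1 ** 2


u1_t, u2_t = 1.0, 1.0   # leader's team-optimal point (global minimiser of J1)

# Affine incentive u1 = gamma(u2) = u1_t + q*(u2 - u2_t). For u2_t to be the
# follower's best response, d/du2 J2(gamma(u2), u2) must vanish at u2_t:
#   dJ2/du2 + q * dJ2/du1 = 0  at (u1_t, u2_t).
dJ2_du1 = u1_t                 # derivative of 0.5*u1**2
dJ2_du2 = 2.0 * (u2_t - 2.0)   # derivative of (u2 - 2)**2
q = -dJ2_du2 / dJ2_du1


def gamma(u2):
    return u1_t + q * (u2 - u2_t)


# Follower's best response to the announced incentive, found by grid search.
u2_grid = np.linspace(-5.0, 5.0, 100001)
u2_star = u2_grid[np.argmin(J2(gamma(u2_grid), u2_grid))]
print(f"q = {q}, follower picks u2 = {u2_star:.4f}, "
      f"leader's cost = {J1(gamma(u2_star), u2_star):.2e}")
```

Here q = 2, and the follower's reduced cost (u2 - 2)² + 0.5(2u2 - 1)² is strictly convex with its minimum at u2 = 1, so the leader attains its team-optimal cost of zero. Without the incentive, with the leader simply fixing u1 = 1, the follower would choose u2 = 2 and the leader's cost would be 1.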

The fourth paper listed in Section 3, [P4], addresses a different class of
problems, wherein the uncertainty is of the second and third types (see
Section 1) and the general approach is the "learning scheme"; here, all three
solution concepts, viz. Nash, hierarchical, and Pareto-optimal, are employed.
The discussion of [P4] includes both finite and infinite-state two-person
decision models, with the uncertainties being in the statistical description of
the random variables appearing in the system dynamics and in the measurements
of the two DM's, each DM developing a different prior on these random
variables. The paper develops different recursive schemes which involve
"learning" in the policy space and lead to policies that converge to the
equilibrium under different stipulations on the information structure of the
problem. We have also analyzed the robustness and sensitivity of team-optimal
solutions to deviations in the perceptions of the DM's from a common stochastic
model, and have shown that adoption of the Nash equilibrium solution leads to
well-posed models, whereas the other two solution concepts lead to bifurcation
once the perceptions deviate from the nominal model. An important by-product of
this theoretical analysis is a recursive relationship which leads to the
optimal solution of a quadratic stochastic team problem with decentralized
information in which the underlying statistics are not Gaussian. There seems to
be considerable potential in this approach to decentralized stochastic control
(team) problems with non-Gaussian statistics, since it leads to a convergent
numerical scheme for the derivation of optimal policies, thus solving an
important class of stochastic optimization problems which have remained
unsolved until now.
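As a rough illustration of how a recursion can produce team-optimal policies under non-Gaussian statistics, the sketch below solves a hypothetical static two-DM quadratic team with a discrete, skewed joint distribution by alternating the person-by-person stationarity conditions (a Gauss-Seidel iteration). For static quadratic teams with a positive-definite cost matrix, person-by-person optimality implies team optimality (Radner's classical result), so the fixed point is team-optimal. This is a generic scheme under assumed data, not the recursion derived in [P4].

```python
import numpy as np

rng = np.random.default_rng(1)
xi_vals = np.array([-1.0, 0.0, 3.0])    # skewed, non-Gaussian support of xi (assumption)
n1, n2 = 2, 3                           # numbers of observation values of DM1 and DM2
p = rng.random((xi_vals.size, n1, n2))  # hypothetical joint pmf p[xi, y1, y2]
p /= p.sum()

# Team cost L = u1**2 + u2**2 + (u1 + u2 - xi)**2, i.e. u'Qu - 2xi(u1 + u2) + xi**2
# with Q = [[2, 1], [1, 2]] positive definite.
p12 = p.sum(axis=0)                                   # joint pmf of (y1, y2)
p1, p2 = p12.sum(axis=1), p12.sum(axis=0)             # marginals of y1 and y2
E_xi_y1 = np.einsum("i,ijk->j", xi_vals, p) / p1      # E[xi | y1]
E_xi_y2 = np.einsum("i,ijk->k", xi_vals, p) / p2      # E[xi | y2]
P2_given_1 = p12 / p1[:, None]                        # P(y2 | y1), rows indexed by y1
P1_given_2 = (p12 / p2[None, :]).T                    # P(y1 | y2), rows indexed by y2

# Person-by-person stationarity conditions, solved alternately (Gauss-Seidel):
#   4*u1(y1) + 2*E[u2(y2) | y1] - 2*E[xi | y1] = 0, and symmetrically for DM2.
u1, u2 = np.zeros(n1), np.zeros(n2)
for _ in range(200):
    u1 = (E_xi_y1 - P2_given_1 @ u2) / 2.0
    u2 = (E_xi_y2 - P1_given_2 @ u1) / 2.0


def expected_cost(u1, u2):
    s = u1[None, :, None] + u2[None, None, :] - xi_vals[:, None, None]
    L = u1[None, :, None] ** 2 + u2[None, None, :] ** 2 + s ** 2
    return float((p * L).sum())


print("u1(y1) =", u1.round(4), " u2(y2) =", u2.round(4),
      " cost =", round(expected_cost(u1, u2), 4))
```

Each half-step averages the other DM's policy with a conditional probability matrix and halves it, so the iteration contracts by at least a factor of 4 per sweep and converges quickly. Its fixed point coincides with the direct solution of the full finite-dimensional quadratic program in the policy values.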

The fifth paper, [P5], deals with a fundamental problem in dynamic game theory,
namely the development of a theory of noncooperative equilibrium for decision
problems whose dynamics are described by higher (than first) order difference
equations. Even though such extensions are trivial in the case of deterministic
optimal control problems (simply increase the dimension of the state space by
introducing new state variables), this is quite a nontrivial task in game
problems. The paper first discusses the reasons behind the intricacies
involved, and then presents a general procedure to obtain informationally
unique noncooperative Nash equilibria in the presence of random disturbances,
with the theoretical result illustrated by a numerical example.
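The state-augmentation argument for the single-controller deterministic case can be sketched as follows. A hypothetical second-order scalar model x[k+1] = a1·x[k] + a2·x[k-1] + b·u[k] is rewritten in first-order form with the augmented state z[k] = (x[k], x[k-1]), after which a standard finite-horizon LQ Riccati recursion applies unchanged. The coefficients, weights, and horizon are assumptions, and the sketch does not address the game-theoretic intricacies treated in [P5].

```python
import numpy as np

# Hypothetical second-order scalar model x[k+1] = a1*x[k] + a2*x[k-1] + b*u[k].
a1, a2, b = 1.2, -0.5, 1.0
N, q, r = 20, 1.0, 0.1          # horizon, state weight, control weight

# First-order (augmented) form with state z[k] = (x[k], x[k-1]).
A = np.array([[a1, a2], [1.0, 0.0]])
B = np.array([[b], [0.0]])
Qz = np.diag([q, 0.0])          # only the current x[k] is penalised


def simulate_scalar(x0, x_prev, u):
    x = [x_prev, x0]
    for uk in u:
        x.append(a1 * x[-1] + a2 * x[-2] + b * uk)
    return np.array(x[1:])      # x[0], ..., x[N]


def simulate_augmented(z0, u):
    z = [np.asarray(z0, dtype=float)]
    for uk in u:
        z.append(A @ z[-1] + B[:, 0] * uk)
    return np.array(z)[:, 0]    # first component of z[k] is x[k]


def lq_gains():
    # Backward Riccati recursion for sum_{k<N} (q x[k]^2 + r u[k]^2) + q x[N]^2.
    P, gains = Qz.copy(), []
    for _ in range(N):
        K = np.linalg.solve(r + B.T @ P @ B, B.T @ P @ A)   # u[k] = -K z[k]
        P = Qz + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1], P       # gains for k = 0..N-1, and P[0]


z0 = np.array([1.0, 0.5])       # x[0] = 1.0, x[-1] = 0.5
gains, P0 = lq_gains()
z, cost = z0.copy(), 0.0
for K in gains:
    u = -(K @ z)[0]
    cost += z @ Qz @ z + r * u ** 2
    z = A @ z + B[:, 0] * u
cost += z @ Qz @ z
print(f"closed-loop cost {cost:.6f}, predicted z0'P0z0 {z0 @ P0 @ z0:.6f}")
```

Both simulations produce identical trajectories for any input sequence, and the closed-loop cost equals the Riccati prediction z0ᵀP0z0. This is the sense in which the deterministic optimal control extension is trivial.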

The sixth paper, [P6], addresses a decentralized large-scale decision (team)
problem with N DM's, and introduces a novel procedure to obtain suboptimal
policies with appealing features. It utilizes the method of chained aggregation
to decompose the overall team problem into (N+1) subproblems: one low-order team
problem with a centralized information structure and N decentralized optimal
control problems. Accordingly, the control of each DM is decomposed into three
components: a decoupling control which induces aggregation, a local control
which controls the subsystem dynamics, and an aggregate control which controls
the dynamics of the interconnection variables. The paper also establishes the
robustness of this composite control with respect to perturbations in the
system dynamics and the cost functional.